Abstract

Structured and semi-structured data describing entities,
taxonomies and ontologies appears in many domains. There
is a huge interest in integrating structured information from
multiple sources; however integrating structured data to in- 
fer complex common structures is a difficult task because the 
integration must aggregate similar structures while avoiding 
structural inconsistencies that may appear when the data is 
combined. In this work, we study the integration of struc- 
tured social metadata: shallow personal hierarchies specified 
by many individual users on the Social Web, and focus on in- 
ferring a collection of integrated, consistent taxonomies. We 
frame this task as an optimization problem with structural 
constraints. We propose a new inference algorithm, which 
we refer to as Relational Affinity Propagation (RAP) that ex- 
tends affinity propagation (Frey and Dueck 2007) by intro- 
ducing structural constraints. We validate the approach on a 
real-world social media dataset, collected from the
photosharing website Flickr. Our empirical results show that
our proposed approach is able to construct deeper and denser
structures compared to an approach using only the standard
affinity propagation algorithm.

Introduction 

Structured and semi-structured data describing entities, re- 
lationships among entities, and taxonomies and ontologies 
over them, appear in many domains. There is a great deal of 
interest in integrating structured information from multiple 
sources. Some of the areas that have seen much active re- 
search include bioinformatics, aggregation services for com- 
mercial products and services, and more traditional enter- 
prise database integration. Integrating structured data to in- 
fer complex common structures is a difficult task because the 
integration must aggregate similar structures while avoiding 
structural inconsistencies that may appear when the data is 
combined. 

This problem becomes even more challenging when one 
is attempting to integrate numerous, heterogeneous metadata 
fragments, generated by multiple users. This data is inher- 
ently noisy and inconsistent, and there is certainly no single, 
unified structure to be found. On the other hand, finding and 
extracting the best exemplar or a set of good example struc- 
tures can be highly beneficial. 

In folksonomy learning (Plangprasopchok and Lerman
2009), structured metadata in the form of hierarchies of con- 
cepts created by many users on the Social Web is combined 
into a global hierarchy of concepts, that reflects how a com- 
munity organizes knowledge. Users who create personal 
hierarchies to organize their content may use idiosyncratic 
categorization schemes (Golder and Huberman 2006) and 
naming conventions. Simply combining nodes with simi- 
lar names is likely to lead to ill-structured graphs containing 
loops and shortcuts (multiple paths from one node to an- 
other), rather than a taxonomy. 

In this paper, we present a probabilistic approach for
aggregating relational data into a desired structure.
Specifically, our task is to integrate many shallow personal
hierarchies, namely saplings, into a deeper, more complete
taxonomy. Our learning method, relational affinity propagation,
extends affinity propagation (Frey and Dueck 2007) by
introducing structural constraints that encourage the
integration process to combine saplings into trees rather than
an arbitrary graph containing loops and shortcuts. We show
that embedding the constraints into the hierarchy learning 
process results in a more accurate merging of saplings that 
leads to a more consistent tree. We demonstrate the utility of 
the proposed approach on real-world data extracted from the 
photosharing site Flickr. Specifically, we combine shallow 
personal hierarchies created by Flickr users into common 
deeper hierarchies of concepts. 

The objective of the optimization is to combine these 
small shallow hierarchies into a small number of deeper and
denser hierarchies that represent how a community of users 
organizes their knowledge. This differs from classical taxon- 
omy and ontology alignment settings (Euzenat and Shvaiko 
2007) where there are typically just a few structures to align, 
and those structures are large with rich and deep structure 
and semantics; here we focus on the much messier setting, 
where we have many small fragments, created by end users 
with a variety of purposes in mind. In these settings, com- 
ing up with a single integrated taxonomy is infeasible, so 
instead we focus on constructing a small number of useful 
taxonomies. 

We motivate our approach with an example of learning a
common taxonomy of concepts from shallow personal
hierarchies (saplings) created by many users and illustrate
some of the challenges that arise during this task. We then
briefly describe how saplings are represented through social
annotation on Flickr. We subsequently review the standard
affinity propagation algorithm and describe our relational
extension to it. Finally, we apply the method to real data sets
and show that the proposed approach is able to learn better
and more complete trees.

Figure 1: Illustrative examples of (a) a conceptual
categorization (hierarchy) commonly shared by various users;
(b) personal hierarchies expressed by the users based on the
conceptual categorization in (a). Nodes with similar names
have the same color for illustrative purposes only.

Motivating Example 

We take as our motivating example user-generated annota- 
tions on the Social Web. We assume that groups of users 
share common conceptualizations of the world, which can 
be represented as a taxonomy or hierarchy of concepts. Fig- 
ure 1(a) depicts one such common conceptualization about 
'animal' and its 'bird' subconcepts shared by a group of 
users. When users organize the content they create, e.g., 
photographs on Flickr, they select some portions of the
common taxonomy for categorization. We observe these
categories through the shallow personal hierarchies Flickr users
create, which we refer to as saplings. Figure 1(b) depicts 
some of the saplings specified by different users to organize 
their 'animal' and 'bird' images. Our ultimate goal is to in- 
fer common conceptual hierarchies from the many individ- 
ual saplings. One natural solution is to aggregate saplings 
shown in Figure 1(b) together into a deeper and bushier tree
shown in Figure 1(a). 

To aggregate saplings, we need a combining strategy that 
measures the degree to which two sapling nodes should or 
should not be merged. Suppose that we have a very simple 
combining strategy that says two nodes are similar if they 
have similar names as in the prior work (Plangprasopchok 
and Lerman 2009). From Figure 1(b), we will end up with 
a graph containing one loop and two paths from 'animal' 
to 'bird', rather than the tree shown in Figure 1(a). Sup- 
pose that we can also access tags with which users annotated 
photos within saplings, and that photos within "domestic 
bird" nodes have tags like "pet," and "farmed" in common, 
and photos belonging to "wild bird" nodes have tags like 
"wildlife" and "forest" in common. A cleverer similarity 
function that, in addition to node names, takes tag statistics 
within a node into consideration, should split 'bird' nodes 
into two different groups: 'domestic bird' and 'wild bird', 
which are put under "pet" and "wildlife" nodes respectively. 

The similarity function plays a crucial part in integrating
saplings, and a similarity function sophisticated enough to
differentiate node senses in detail may potentially integrate
the final tree correctly. Nevertheless, finding and tuning
such a function is very difficult; moreover, the data is often
inconsistent, noisy and incomplete, especially on the Social
Web, where data is generated by many different users.

One possible way to tackle this challenge is to use a sim- 
ple similarity function and incorporate constraints during the 
merging process. Intuitively, we would not consider merg- 
ing the 'bird' node under 'pet' with the one under 'wildlife' 
because it will result in multiple paths from 'animal'. These 
structural constraints are used during the sapling aggregation
process to ensure that the learned structure is a tree.
Specifically, the constraints prevent two nodes from being merged
if (1) this will lead to links from different parent concepts 
or (2) this will lead to an incoming link to the root node of 
a tree. These constraints guarantee that there is, at most, a 
single path from one node to another. 
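
The effect of the single parent constraint can be illustrated with a small sketch. The edge lists, the `bird@parent` naming scheme and the helper `count_paths` are hypothetical illustrations, not part of the method; the sketch only counts paths in a tiny acyclic graph built from saplings like those in Figure 1(b).

```python
from collections import defaultdict

def count_paths(edges, src, dst):
    """Count distinct directed paths from src to dst in an acyclic edge list."""
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    def walk(node):
        if node == dst:
            return 1
        return sum(walk(c) for c in children[node])

    return walk(src)

# Hypothetical saplings in the spirit of Figure 1(b), written as (parent, child).
saplings = [("animal", "pet"), ("animal", "wildlife"),
            ("pet", "bird"), ("wildlife", "bird")]

# Merging by name alone: both 'bird' leaves collapse into one node.
merged_by_name = set(saplings)

# Respecting the single parent constraint: one 'bird' cluster per parent.
merged_with_constraint = {(p, c if c != "bird" else f"bird@{p}")
                          for p, c in saplings}

print(count_paths(merged_by_name, "animal", "bird"))              # two paths
print(count_paths(merged_with_constraint, "animal", "bird@pet"))  # one path
```

Name-only merging yields two paths from 'animal' to 'bird', whereas keeping one 'bird' cluster per parent leaves a single path, which is the tree property the constraints are meant to protect.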

Structured Metadata in Flickr 

Structured data in the form of shallow hierarchies is
ubiquitous on the Social Web. On Flickr, users can arbitrarily
group related photos into sets and then group related sets in 
collections. Some users create multi-level hierarchies con- 
taining collections of collections, etc., but the vast major- 
ity of users who use collections create shallow hierarchies, 
consisting of collections and their constituent sets. These 
personal hierarchies generally represent subclass and part- 
of relationships. 

We formally define a sapling as a shallow tree representing
a personal hierarchy, which is composed of a root node $r^i$
and its children, or leaf, nodes $\{l^i_1, \ldots, l^i_j\}$. The
root node corresponds to a user's collection, and inherits its
name, while the leaf nodes correspond to the collection's
constituent sets and inherit their names. We assume that
hierarchical relations between a root and its children,
$r^i \rightarrow l^i_j$, specify broader-narrower relations.

On Flickr, users can attach tags only to photos. A 
sapling's leaf node corresponds to a set of photos, and the 
tag statistics of the leaf are aggregated from that set's con- 
stituent photos. Tag statistics are then propagated from leaf 
nodes to the parent node. We define the tag statistics of node $x$
as $\tau_x := \{(t_1, f_{t_1}), (t_2, f_{t_2}), \cdots, (t_k, f_{t_k})\}$,
where $t_k$ and $f_{t_k}$ are a tag and its frequency, respectively.
Hence, $\tau_{r^i}$ is aggregated from all $\tau_{l^i_j}$'s.
These tag statistics can also be used as a
feature for determining if two nodes are similar (of the same 
concept). 
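
As a minimal sketch of this representation, the following code stores a sapling and aggregates tag statistics from photos to leaves and from leaves to the root. The class names `Leaf` and `Sapling`, their fields, and the example tags are assumptions made for illustration; they are not taken from the Flickr data or any Flickr API.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Leaf:
    name: str            # name of the Flickr set
    photo_tags: list     # one list of tags per photo in the set

    def tag_stats(self) -> Counter:
        """Aggregate tag frequencies over the set's photos."""
        stats = Counter()
        for tags in self.photo_tags:
            stats.update(tags)
        return stats

@dataclass
class Sapling:
    root_name: str       # name of the Flickr collection
    leaves: list = field(default_factory=list)

    def root_tag_stats(self) -> Counter:
        """Propagate leaf statistics up to the root by summing them."""
        stats = Counter()
        for leaf in self.leaves:
            stats.update(leaf.tag_stats())
        return stats

# Hypothetical example data.
sapling = Sapling("animal", [
    Leaf("bird", [["pet", "farmed"], ["pet"]]),
    Leaf("cat", [["pet", "indoor"]]),
])
print(sapling.root_tag_stats())
```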

Affinity Propagation 

A key component of folksonomy learning through sapling 
integration is merging similar nodes in different saplings.
Merging similar root nodes expands the width of the learned 
tree, while merging the leaf of one sapling to the root of an- 
other extends the depth of the learned tree. We cast the merg- 
ing process as clustering sapling nodes, and we use affinity 
propagation to perform the clustering. Below, we briefly re- 
view the original AP and then describe our extension, which 
incorporates structural constraints, and refer to the extended 
version as Relational Affinity Propagation (RAP). 



Affinity Propagation (AP)

Affinity Propagation (Frey and Dueck 2007) is a clustering 
algorithm that identifies a set of exemplar points that are rep- 
resentative of all the points in the data set. The exemplars 
emerge as messages are passed between data points, with 
each point assigned to an exemplar. AP attempts to find the 
exemplar set which maximizes the net similarity, or the over- 
all sum of similarities between all exemplars and their data 
points. 

In this paper, we describe AP in terms of a factor 
graph (Kschischang, Frey, and Loeliger 2001) on binary 
variables, as recently introduced by Givoni and Frey (Givoni 
and Frey 2009). The model is comprised of a square matrix 
of binary variables, along with a set of factor nodes imposed 
on each row and column in the matrix. Following the notation
defined in the original paper (Givoni and Frey 2009),
let $c_{ij}$ be a binary variable. $c_{ij} = 1$ indicates that node
$i$ belongs to node $j$ (or, $j$ is an exemplar of $i$); otherwise,
$c_{ij} = 0$. Let $N$ be the number of data points; consequently,
the size of the matrix is $N \times N$.

There are two types of constraints that enforce cluster
consistency. The first type, $I_i$, which is imposed on row
$i$, indicates that a data point can belong to only one exemplar
($\sum_j c_{ij} = 1$). The second type, $E_j$, which is imposed on
column $j$, indicates that if a point other than $j$ chooses $j$ as
its exemplar, then $j$ must be its own exemplar ($c_{jj} = 1$).
AP avoids forming exemplars and assigning cluster
memberships which violate these constraints. In particular, if the
configuration at row $i$ violates the $I$ constraint, $I_i$ will
become $-\infty$ (and similarly for $E_j$).

In addition to the constraints, there is a similarity
function $S(\cdot)$, which indicates how similar a certain node is
to its exemplar. If $c_{ij} = 1$, then $S(c_{ij})$ is the similarity
between nodes $i$ and $j$; otherwise, $S(c_{ij}) = 0$. $S(c_{jj})$
evaluates "self-similarity," also called "preference", which
should be less than the maximum similarity value in order to
avoid all points becoming singleton exemplars; otherwise, that
configuration would yield the highest net similarity. In general,
the higher the value of the preference for a particular point,
the more likely that point will become an exemplar. In
addition, we can set the same self-similarity value for all data
points, which indicates that all points are equally likely to
become exemplars.

A graphical model for affinity propagation is depicted in 
Figure 2, described in terms of a factor graph. In a log-form, 
the global objective function, which measures how good the 
present configuration (a set of exemplars and cluster assign- 
ments) is, can be written as a summation of all local factors 
as follows: 

$$S(c_{11}, \cdots, c_{NN}) = \sum_{i,j} S_{ij}(c_{ij}) + \sum_{i} I_i(c_{i1}, \cdots, c_{iN}) + \sum_{j} E_j(c_{1j}, \cdots, c_{Nj}). \qquad (1)$$

That is, optimizing this objective function finds the
configuration that maximizes the net similarity $S$, while not
violating the $I$ and $E$ constraints.
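
To make the objective concrete, the sketch below evaluates it for a given binary assignment matrix. It is an illustration under stated assumptions rather than the authors' implementation: the function name `ap_objective` and the example similarity values are hypothetical, and the diagonal of `S` is assumed to hold the preferences.

```python
import numpy as np

def ap_objective(C, S):
    """Evaluate the AP objective for a binary assignment matrix C.

    Returns -inf when an I (row) or E (column) constraint is violated,
    otherwise the net similarity.
    """
    C = np.asarray(C)
    S = np.asarray(S, dtype=float)
    # I constraint: every point chooses exactly one exemplar.
    if not np.all(C.sum(axis=1) == 1):
        return -np.inf
    # E constraint: a column chosen by some point must have c_jj = 1.
    chosen = C.sum(axis=0) > 0
    if np.any(chosen & (np.diag(C) == 0)):
        return -np.inf
    # Net similarity: sum of S_ij over the entries with c_ij = 1.
    return float((S * C).sum())

# Hypothetical similarities; the diagonal holds the preferences.
S = [[0.1, 0.8, 0.2],
     [0.9, 0.1, 0.3],
     [0.2, 0.3, 0.1]]
valid = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]    # points 0 and 1 share exemplar 0
invalid = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # chosen columns are not exemplars
print(ap_objective(valid, S), ap_objective(invalid, S))
```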





Figure 2: The original binary variable model for Affinity
Propagation proposed by Givoni and Frey (Givoni and Frey
2009): (a) a matrix of binary hidden variables (circles) and
their factors (boxes); (b) incoming and outgoing messages of
a hidden variable node from/to its associated factor nodes.

The original work uses the max-sum algorithm to optimize
this global objective function, and it requires updating and
passing five messages as shown in Figure 2(b). Since each
hidden node $c_{ij}$ is a binary variable (two possible values),
one can pass a scalar message, the difference between the
messages when $c_{ij} = 1$ and $c_{ij} = 0$, instead of carrying two
messages at a time. The equations to update these messages
are described in greater detail in Section 2 of the original
work (Givoni and Frey 2009).

Once the inference process terminates, the MAP
configuration (exemplars and their members) can be recovered as
follows. First, identify an exemplar set by considering the
sum of all incoming messages of each $c_{jj}$ (each node on the
diagonal of the variable matrix). If the sum is greater than 0
(there is a higher probability that node $j$ is an exemplar), $j$ is
an exemplar. Once a set of exemplars $K$ is recovered, each
non-exemplar point $i$ is assigned to the exemplar $k$ if the sum
of all incoming messages of $c_{ik}$ is the highest compared to
the other exemplars.
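
The recovery step can be sketched as follows, assuming the per-variable sums of incoming scalar messages have already been collected in a matrix. The function name `recover_assignments`, the input layout, and the error raised when no exemplar emerges are assumptions made for illustration.

```python
import numpy as np

def recover_assignments(msg_sum):
    """msg_sum[i, j]: sum of all incoming scalar messages at variable c_ij."""
    msg_sum = np.asarray(msg_sum, dtype=float)
    # A point j is an exemplar when the evidence for c_jj = 1 is positive.
    exemplars = np.flatnonzero(np.diag(msg_sum) > 0)
    if exemplars.size == 0:
        raise ValueError("no exemplar has a positive message sum")
    # Every other point joins the exemplar with the highest message sum.
    assignment = exemplars[np.argmax(msg_sum[:, exemplars], axis=1)]
    assignment[exemplars] = exemplars  # exemplars represent themselves
    return exemplars, assignment

# Hypothetical message sums for three points.
sums = [[2.0, -1.0, 0.5],
        [1.5, -3.0, 0.2],
        [0.1, -2.0, 1.0]]
exemplars, assignment = recover_assignments(sums)
print(exemplars, assignment)
```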

Relational Affinity Propagation (RAP)

We extend the above algorithm to add in structural
constraints that will ensure that the learned folksonomy makes
sense: it contains no loops and, to the extent possible, forms
a taxonomy. In fact, here, we require it to be a tree. Since we want
the learned folksonomy to be a tree, all nodes assigned to 
some exemplar must have their incoming links from nodes 
in the same cluster, i.e., assigned to the same exemplar. To 
achieve this, we must enforce the following two constraints: 
(1) merging should not create incoming links to a cluster, 
or concept, from more than one parent cluster (single parent 
constraint); (2) merging should not create an incoming link 
to the root of the induced tree (no root parent constraint). 
For the second constraint, we can simply discard all sapling 
leaves that are named similarly to the tree root. Hence, we
only need to enforce the first constraint. The first constraint 
will be violated if leaf nodes of two saplings are merged, 
i.e., assigned to the same exemplar, while the root nodes of 
these saplings are assigned to different exemplars. Conse- 
quently, the leaf cluster will have multiple parents pointing 
to it, which leads to an undesirable configuration. 




Figure 3: The Relational Affinity Propagation model proposed
in this paper: (a) a schematic diagram of the matrix of binary
hidden variables (circles); variables within the green shade
correspond to leaf nodes, while those within the pink shade
correspond to root nodes of some saplings. The F constraint is
imposed only at leaf-type nodes. (b) Incoming and outgoing
messages of a hidden variable node from/to its associated
factor nodes. There are two more messages, $\sigma$ and $\tau$.
Note that for (a), we omit the $E$, $I$ and $S$ factors simply
for the sake of clarity.

Let $pa(\cdot)$ be a function that returns the index of the
parent node of its argument, and $explr(\cdot)$ be a function that
returns the index of the argument's exemplar. The factor $F$,
the "single parent constraint", checks the violation of multiple
parent concepts pointing to a given concept. The constraint
is formally defined as follows:



$$F_j(c_{1j}, \cdots, c_{Nj}) = \begin{cases} -\infty & \exists\, i, k : c_{ij} = 1;\ c_{kj} = 1;\ explr(pa(i)) \neq explr(pa(k)), \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

Figure 3(a) illustrates the way we impose the new con- 
straint on the binary variable matrix. The configuration 
shown in the figure is valid since both C and D belong to 
the same exemplar E and their respective parents, A and 
B, belong to the same exemplar A. However, if $c_{BB} = 1$,
then the configuration is invalid, because parents of nodes
in the cluster of exemplar E will belong to different
exemplars. This constraint is imposed only on leaf nodes, because
merging root nodes will never lead to multiple parents. The
global objective function for Relational Affinity Propagation
is basically Eq. (1) plus $\sum_j F_j(c_{1j}, \cdots, c_{Nj})$.
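
The single parent constraint can be checked directly for one cluster, as in the sketch below. The function name `single_parent_factor` and the dictionary-based inputs are hypothetical; the example mirrors the configuration described for Figure 3(a), with roots A and B and leaves C and D.

```python
import math

def single_parent_factor(members, parent, exemplar):
    """Evaluate the single parent factor for the leaves of one cluster.

    members:  leaf indices i assigned to the cluster (c_ij = 1)
    parent:   dict mapping a leaf to its sapling root, i.e. pa(i)
    exemplar: dict mapping a node to its exemplar, i.e. explr(.)
    """
    parent_exemplars = {exemplar[parent[i]] for i in members}
    # More than one distinct parent exemplar means two incoming parents.
    return -math.inf if len(parent_exemplars) > 1 else 0.0

parent = {"C": "A", "D": "B"}

# Valid: both parents, A and B, belong to exemplar A.
print(single_parent_factor(["C", "D"], parent, {"A": "A", "B": "A"}))

# Invalid: B becomes its own exemplar, so the cluster has two parents.
print(single_parent_factor(["C", "D"], parent, {"A": "A", "B": "B"}))
```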

We modify the equations for updating the messages $\rho$
and $\beta$, and also derive $\sigma$ and $\tau$ to take into
account this additional constraint. Following the max-sum
message update rule from a variable node to a factor node
(cf., eq. 2.4 in Chapter 8 of (Bishop 2006)), the message
update formulas for $\rho$, $\beta$ and $\sigma$ are simply:

$$\rho_{ij} = S(i,j) + \eta_{ij} + \tau_{ij}, \qquad (3)$$

$$\beta_{ij} = S(i,j) + \alpha_{ij} + \tau_{ij}, \qquad (4)$$

$$\sigma_{ij} = S(i,j) + \alpha_{ij} + \eta_{ij}. \qquad (5)$$
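
These three updates are element-wise sums, so they can be written in a few lines. The sketch below assumes that all messages are stored as $N \times N$ arrays and, following the Givoni and Frey formulation, that $\alpha$ comes from the $E$ factor and $\eta$ from the $I$ factor; the function name is hypothetical.

```python
import numpy as np

def variable_to_factor_messages(S, alpha, eta, tau):
    """Scalar messages sent by each variable c_ij to its E, I and F factors.

    Each outgoing message sums the similarity and the incoming messages
    from the other two factors.
    """
    rho = S + eta + tau      # to the E factor (excludes alpha)
    beta = S + alpha + tau   # to the I factor (excludes eta)
    sigma = S + alpha + eta  # to the F factor (excludes tau)
    return rho, beta, sigma

# Hypothetical constant messages on a 2 x 2 problem.
S = np.ones((2, 2))
rho, beta, sigma = variable_to_factor_messages(
    S, alpha=np.full((2, 2), 2.0), eta=np.full((2, 2), 3.0),
    tau=np.full((2, 2), 4.0))
print(rho[0, 0], beta[0, 0], sigma[0, 0])
```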

For deriving the message update equation for $\tau$, we have
to consider two cases: $i = j$ and $i \neq j$, i.e., the $\tau$
message to the nodes on the diagonal and $\tau$ for the others.
For simplicity, we also assume that all leaf nodes have index
numbers less than those of any roots. Let $L$ be the number of
leaf nodes. Hence, leaf node indices run from 1 to $L$.



For the case $i = j$ (for the diagonal nodes $c_{jj}$), we have to
consider the update message for $\tau$ in two possible settings:
$c_{jj} = 1$ and $c_{jj} = 0$ (or, they can be written as
$\tau_{jj}(1)$ and $\tau_{jj}(0)$ respectively), and then find the
best configuration for these settings. Following the max-sum
message update rule from a factor node to a variable node
(cf., eq. 2.5 in Chapter 8 of (Bishop 2006)), when $c_{jj} = 1$:

$$\tau_{jj}(1) = \max_{S^j} \Big\{ \sum_{k \in S^j;\, k \neq j} \sigma_{kj}(1) + \sum_{k \notin S^j} \sigma_{kj}(0) \Big\}. \qquad (6)$$

For $c_{jj} = 0$, we have

$$\tau_{jj}(0) = \sum_{k = 1:L;\, k \neq j} \sigma_{kj}(0), \qquad (7)$$



where $S^j \in T$, with $T$ the collection of subsets of
$\{1, \cdots, L\}$; $j \in S^j$ and all $k$ in $S^j$ share
the same parent exemplar. Eq. (6) will favor the "valid"
configuration (the assignments of $c_{kj}$), which maximizes the
summation of all incoming messages to the factor node $F_j$.
For Eq. (7), since no other nodes can belong to $j$, the valid
configuration is simply setting all $c_{kj}$ to 0. Note that we omit
$F_j$ from the above equations since invalid configurations are
never optimal, so they will never be chosen. Thus,
$F_j$ is always 0.

From Eq. (6) and Eq. (7), the scalar message $\tau_{jj}$ is simply:

$$\tau_{jj} = \max_{S^j} \sum_{k \in S^j;\, k \neq j} \sigma_{kj}. \qquad (8)$$
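
The maximization in Eq. (8) can be sketched by brute force and compared with a shortcut. The sketch assumes that the parent exemplar of every leaf is available as a fixed label (for instance from the current estimates), which the text does not spell out; under that assumption the best subset simply keeps every leaf with a positive $\sigma_{kj}$ among those sharing $j$'s parent exemplar. Both function names and the example values are hypothetical.

```python
from itertools import combinations

def tau_diag_bruteforce(j, sigma_col, parent_ex):
    """Enumerate subsets S containing j whose members share one parent
    exemplar, and maximize the sum of sigma_kj over k in S, k != j."""
    others = [k for k in range(len(sigma_col)) if k != j]
    best = float("-inf")
    for r in range(len(others) + 1):
        for extra in combinations(others, r):
            subset = (j,) + extra
            if len({parent_ex[k] for k in subset}) == 1:
                best = max(best, sum(sigma_col[k] for k in extra))
    return best

def tau_diag_fast(j, sigma_col, parent_ex):
    """Same value: keep every positive sigma_kj among the leaves whose
    parent exemplar matches that of j."""
    return sum(max(0.0, s) for k, s in enumerate(sigma_col)
               if k != j and parent_ex[k] == parent_ex[j])

# Hypothetical sigma messages in column j = 0 and parent exemplar labels.
sigma_col = [0.5, -0.2, 0.7, 0.3]
parent_ex = ["A", "A", "B", "A"]
print(tau_diag_bruteforce(0, sigma_col, parent_ex),
      tau_diag_fast(0, sigma_col, parent_ex))
```

The brute-force version is exponential in the number of leaves and is only meant to check the shortcut on small inputs.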



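For $i \neq j$, we also have to consider two sub-cases in the
same way as in the previous setting. When $c_{ij} = 1$:

$$\tau_{ij}(1) = \max_{S^j} \Big\{ \sum_{k \in S^j;\, k \neq i} \sigma_{kj}(1) + \sum_{k \notin S^j} \sigma_{kj}(0) \Big\}. \qquad (9)$$

For $c_{ij} = 0$, we have

$$\tau_{ij}(0) = \max_{S} \Big\{ \sum_{k \in S;\, k \neq i} \sigma_{kj}(1) + \sum_{k \notin S;\, k \neq i} \sigma_{kj}(0) \Big\}, \qquad (10)$$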



where $S \in T$, with $T$ the collection of subsets of
$\{1, \cdots, L\}$, and all $k$ in $S$ share the same
parent exemplar, without the restriction that $S$ must contain
$j$. In particular, the best configuration may or may not have
$j$ as the exemplar, which is different from the $c_{ij} = 1$ case,
which requires the best configuration to have $j$ as
the exemplar.

The scalar message $\tau_{ij}$, which is the difference between
$\tau_{ij}(1)$ (Eq. (9)) and $\tau_{ij}(0)$ (Eq. (10)), is as follows:



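$$\tau_{ij} = \max_{S^j} \sum_{k \in S^j;\, k \neq i} \sigma_{kj} - \max_{S} \sum_{k \in S;\, k \neq i} \sigma_{kj}. \qquad (11)$$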
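
The step from Eqs. (9) and (10) to Eq. (11) can be spelled out as follows. The derivation assumes, as for the other scalar messages, that $\sigma_{kj} = \sigma_{kj}(1) - \sigma_{kj}(0)$, and that $i \in S^j$ whenever $c_{ij} = 1$; both are our reading of the text rather than statements taken from it.

```latex
\begin{align*}
\tau_{ij} &= \tau_{ij}(1) - \tau_{ij}(0) \\
 &= \max_{S^j}\Big[\sum_{k \in S^j;\, k \neq i} \sigma_{kj}(1)
      + \sum_{k \notin S^j} \sigma_{kj}(0)\Big]
  - \max_{S}\Big[\sum_{k \in S;\, k \neq i} \sigma_{kj}(1)
      + \sum_{k \notin S;\, k \neq i} \sigma_{kj}(0)\Big] \\
 &= \max_{S^j} \sum_{k \in S^j;\, k \neq i} \sigma_{kj}
  - \max_{S} \sum_{k \in S;\, k \neq i} \sigma_{kj}.
\end{align*}
```

The last line follows by subtracting the constant $\sum_{k \neq i} \sigma_{kj}(0)$ inside both maximizations: it cancels the $\sigma_{kj}(0)$ terms outside each subset and turns each $\sigma_{kj}(1)$ inside the subset into the scalar message $\sigma_{kj}$.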



The inference of exemplars and cluster assignments starts
by initializing all messages to zero and keeps updating all
messages for each node iteratively until convergence. One
possible way to determine convergence is to monitor the
stability of the net similarity value, $\sum_{i,j} S_{ij}(c_{ij})$,
as in the original AP.
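
A minimal sketch of the bookkeeping around this loop is given below. It shows the usual damped message update used with affinity propagation and one possible stability test on the net similarity; the function names, the window length and the tolerance are illustrative assumptions, not values from the paper.

```python
def damp(old, new, damping=0.9):
    """Blend a freshly computed message with its previous value."""
    return damping * old + (1.0 - damping) * new

def has_converged(net_similarities, window=10, tol=1e-6):
    """True when the net similarity varied by at most tol over the last
    `window` iterations (window and tol are illustrative choices)."""
    if len(net_similarities) < window:
        return False
    recent = net_similarities[-window:]
    return max(recent) - min(recent) <= tol

print(damp(1.0, 2.0))                # moves 10% of the way to the new value
print(has_converged([5.0] * 10))     # stable history
print(has_converged(list(range(10))))  # still changing
```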

Recovering MAP exemplars and cluster assignments can
be done as in the original AP with one extra step, in order to
guarantee that the final graph is in a tree form. In particular,
for a certain exemplar, we sort its members by their message
summation value in descending order. Note that the higher
the value, the more likely the node belongs to its exemplar.
The parent exemplar of a cluster of nodes is determined as
follows. If the exemplar of the cluster is a leaf node, the
parent exemplar of the cluster is the parent exemplar of the
exemplar. Otherwise, the parent exemplar of the highest-ranked
leaf node will be chosen. We then split off all member
nodes whose parent exemplars differ from that of the cluster.
Note that a more sophisticated approach to this task may
be applied: e.g., once split, find the next best valid exemplar
to join. However, this more complex procedure is very
cumbersome, since the decision to re-join a certain cluster may
recursively result in the invalidity of other clusters.
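
The extra correction step can be sketched for a single cluster as follows. The function `split_cluster`, its dictionary inputs and the example node ids are hypothetical; in particular, keeping root-type members untouched is an assumption, since the text only describes how leaf members are handled.

```python
def split_cluster(exemplar, members, score, is_leaf, parent_exemplar):
    """Keep only members whose parent exemplar matches the cluster's.

    members:         node ids assigned to `exemplar` (including itself)
    score:           dict node -> message summation value
    is_leaf:         dict node -> True for sapling leaves
    parent_exemplar: dict leaf -> exemplar of that leaf's sapling root
    Returns (cluster_parent, kept, split_off).
    """
    ranked = sorted(members, key=lambda n: score[n], reverse=True)
    leaves = [n for n in ranked if is_leaf[n]]
    if not leaves:                       # only roots: nothing to split
        return None, ranked, []
    if is_leaf[exemplar]:
        cluster_parent = parent_exemplar[exemplar]
    else:                                # use the highest-ranked leaf
        cluster_parent = parent_exemplar[leaves[0]]
    kept = [n for n in ranked
            if not is_leaf[n] or parent_exemplar[n] == cluster_parent]
    split_off = [n for n in ranked if n not in kept]
    return cluster_parent, kept, split_off

# Hypothetical 'bird' cluster with one member under a different parent.
score = {"b1": 3.0, "b2": 2.0, "b3": 1.0}
is_leaf = {"b1": True, "b2": True, "b3": True}
parent_exemplar = {"b1": "pet", "b2": "pet", "b3": "wildlife"}
result = split_cluster("b1", ["b1", "b2", "b3"], score, is_leaf, parent_exemplar)
print(result)
```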

Validation on a Toy Example 

To evaluate the utility of RAP, we first apply it to a simplified 
data set, which consists of a small fraction of the personal hi- 
erarchies taken from the Flickr data set described in (Plang- 
prasopchok and Lerman 2009). These hierarchies are about 
'animal', 'pet', 'wildlife', 'bird' (wild and domestic), which 
are very similar to Figure 1(b). The ideal integrated
hierarchy is similar to Figure 1(a), where the 'bird' concept is
split into domestic and wildlife birds under the 'animal' concept.
There are a total of 96 saplings generated by different users in
this data set.

We quantitatively compare the quality of the tree learned 
by RAP against that learned by the standard AP algorithm. 
In addition, since the AP does not have machinery for "cor- 
recting" the output graph into a tree, after the final inference 
step, we run the same procedure that recovers exemplars and 
valid cluster assignments that is used in the final step of RAP. 
We used the following evaluation metrics: net similarity and 
tree depth and bushiness. Intuitively, we prefer "a tree of ex- 
emplars," which clusters as many similar nodes as possible 
(high net similarity), as well as a comprehensive tree (bushy 
and deep). 

We used a simple similarity function, $S(i,j)$, to compute
the similarity between two nodes $i$ and $j$. Let $t^{ij}$ be the
number of common tags of nodes $i$ and $j$. If $i$ and $j$ have the
same stemmed name, $S(i,j) = \min(1.0, t^{ij})$ (if they have
at least one tag in common, the similarity value goes to
1); otherwise, $S(i,j) = 0$. The damping factor is set to 0.9,
and the number of iterations is set to 4,000. The preference
is set to 0.0001 uniformly.
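
This similarity function is simple enough to sketch in a few lines. The stemmer below is a toy stand-in (lower-casing and dropping a plural 's'), since the text does not say which stemmer was used; the function names and example tags are likewise hypothetical.

```python
def stem(name):
    """Toy stemmer (an assumption): lower-case and drop a plural 's'."""
    name = name.lower().strip()
    return name[:-1] if name.endswith("s") and len(name) > 3 else name

def similarity(name_i, tags_i, name_j, tags_j):
    """min(1.0, number of common tags) if the stemmed names match, else 0."""
    if stem(name_i) != stem(name_j):
        return 0.0
    common = len(set(tags_i) & set(tags_j))
    return min(1.0, float(common))

print(similarity("Birds", ["pet", "farmed"], "bird", ["pet"]))  # shared tag
print(similarity("bird", ["forest"], "bird", ["pet"]))          # no shared tag
print(similarity("bird", ["pet"], "cat", ["pet"]))              # names differ
```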

The inference converges in both approaches before 4,000
iterations and returns a single 'animal' tree. The net
similarities of the trees (after correcting the graphs) are 83.41
(AP) and 106.31 (RAP). Both approaches return trees of similar
depth, namely 3. The distributions of the number of exemplars
at depths (0, 1, 2, 3) for AP and RAP are (1, 25, 30, 9)
and (1, 2, 32, 12) respectively, and the distributions of the
number of instances (data points) at depths (0, 1, 2, 3) are
(35, 78, 46, 10) for AP, and (35, 81, 51, 13) for RAP.
Although AP yields a "bushier" tree of exemplars, this does
not really demonstrate its superiority to RAP. In fact, at
depth 1, AP shatters the 'pet' concept into many singleton
clusters, while RAP nicely merges them into a single cluster.
The distribution of the number of instances at different
depths also indicates that RAP can aggregate more nodes
into a tree, compared to AP. Hence, RAP's overall quality
on this small example is higher.

Figure 4: Trees induced by AP (a) and RAP (b). The numbers
in brackets indicate the index numbers of the exemplar
nodes of their clusters. Non-exemplar nodes are collapsed
to one of these exemplars. The nodes without a number are
those that cluster all nodes with the same name together.

The trees learned by both methods are shown in Figure 4. 
AP clusters both wild and domestic 'bird' nodes together in 
cluster number 50 as illustrated in Figure 4(a), while RAP
separates them into two different clusters: cluster number 16 
(wild bird) and cluster number 55 (domestic bird). By tak- 
ing structural constraints into account, RAP is able to sepa- 
rate 'bird' into different senses. Specifically, during the in- 
ference process this constraint prevents all bird nodes from 
being clustered together, because this will create paths to 
this cluster from two different parent clusters: 'wildlife' and 
'pet'. The only valid configuration then is to have two 'bird'
clusters: one under 'wildlife' and one under 'pet'. The in- 
ference process optimizes the tree within this valid configu- 
ration. AP, on the other hand, does not have this machinery 
and, consequently, merges all 'bird' nodes together without 
concern for where the incoming links come from. The final 
correction step does not optimize the tree structure, and the 
tree learned by AP is worse than the one learned by RAP.

Validation on Real-World Data

We also compared RAP against AP on the data collected 
from Flickr (Plangprasopchok and Lerman 2009). We man- 
ually selected 15 seed terms, and for each term used the fol- 
lowing heuristic to obtain "relevant" saplings. First, we se- 
lected saplings whose root names were similar to the seed 
term. We then used the leaf node names of these saplings to 
select other saplings whose root names were similar to these 
names, and so on, for a total of two iterations. We used
the settings described above but with the number of
iterations limited to 2,000.
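
The sapling selection heuristic can be sketched as an iterative expansion. Several details below are assumptions: the function name `select_saplings`, the representation of a sapling as a root name with a list of leaf names, the exact-match name test used as a placeholder for "similar", and the reading that each selection round counts as one iteration.

```python
def select_saplings(saplings, seed, iterations=2, similar=None):
    """saplings: list of (root_name, [leaf_names]). Returns selected indices."""
    if similar is None:  # placeholder name match; the actual test may differ
        similar = lambda a, b: a.lower() == b.lower()
    frontier, selected = {seed}, set()
    for _ in range(iterations):
        new = {idx for idx, (root, _) in enumerate(saplings)
               if idx not in selected
               and any(similar(root, term) for term in frontier)}
        selected |= new
        # Leaf names of the newly selected saplings drive the next round.
        frontier = {leaf for idx in new for leaf in saplings[idx][1]}
    return sorted(selected)

# Hypothetical saplings.
saplings = [("animal", ["bird", "pet"]), ("bird", ["eagle"]),
            ("eagle", ["bald eagle"]), ("car", ["wheel"])]
print(select_saplings(saplings, "animal"))
```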

For each seed, we ran AP and RAP on all extracted
saplings, then measured the net similarity of the induced
tree. For measuring the induced tree's structure in terms
of bushiness and depth, we introduce a simple yet intuitive
measure, namely Area Under Tree (AUT), which takes
both tree bushiness and depth into account. To calculate
AUT for a given tree, we plot the distribution of the number
of nodes at each level and then compute the area under
the plot. Intuitively, trees that keep branching out at
each level will get a larger AUT than those that are short and
thin. Suppose that we have a tree in which the numbers of
nodes at the 1st and 2nd levels are 3 and 4, respectively. With
the scale of tree depth set to 1.0 and the single root node at
level 0, the AUT of this tree would be
$0.5 \times (1 + 3) + 0.5 \times (3 + 4) = 5.5$.
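
The AUT computation amounts to the trapezoidal rule applied to the nodes-per-level curve, as in the sketch below. The function name `area_under_tree` and its argument layout (root level first) are assumptions made for illustration.

```python
def area_under_tree(nodes_per_level, depth_scale=1.0):
    """Trapezoidal area under the nodes-per-level curve, root level first."""
    pairs = zip(nodes_per_level, nodes_per_level[1:])
    return sum(0.5 * depth_scale * (a + b) for a, b in pairs)

# One root, then 3 nodes at level 1 and 4 nodes at level 2.
print(area_under_tree([1, 3, 4]))
```

For the tree in the worked example this gives 2.0 for the first segment and 3.5 for the second, i.e. 5.5 in total.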

seed           net similarity       number of    net similarity      AUT
               (entire data)        trees        (best tree)
               AP        RAP        AP    RAP    AP       RAP        AP      RAP
africa         46.93     103.92     2     2      25.41    63.31      55      103
animal         3609.32   3839.88    4     4      142.48   156.28     606     727
asia           474.06    617.15     2     2      143.75   219.86     445.5   523.5
australia      167.65    227.24     2     2      72.11    123.41     104     151
bird           231.73    365.02     3     3      31.11    33.31      75      71.5
Canada         278.7     312.29     2     2      46.32    50.62      127     138
craft          235.65    285.75     6     6      24.31    24.41      72.5    67.5
fish           124.83    132.43     1     1      44.61    45.71      69      68.5
insect         244.69    257.19     31    31     34.81    37.1       66.5    63
invertebrate   34.01     56.2       1     1      32.91    55.11      97.5    99.5
mammal         97.86     117.25     2     2      50.71    36.71      58.5    65
plant          565.72    714.7      2     2      12.3     13.4       19.5    19
sport          725.81    758.2      9     9      92.85    105.74     269     252.5
uk             640.48    754.16     1     1      370.84   526.7      633     673.5
usa            286.28    390.56     1     1      244.29   341.99     354.5   370


Table 1: The table presents the performance comparisons 
between AP and RAP by the net similarity for the entire data 
and the best induced tree (with highest net similarity) on 15 
different seed sets. The number of induced trees and Area 
Under Tree (AUT) are also reported. 

Both quantitative and manual inspections confirm the
advantages of RAP over AP. As shown in Table 1, RAP yields
better net similarity in all cases. Although both approaches 
return the same number of trees, RAP appears to better clus- 
ter similar nodes in all but one case, namely 'mammal'. In 
terms of AUT, though many trees are of similar quality, in 
cases where significant differences exist, they are in RAP's 
favor. Manual inspection reveals that AP tends to "shatter" 
trees into isolated singletons rather than merge similar nodes 
together, as RAP does. 

Related Work 

Affinity propagation has been applied to many clustering 
problems, e.g. segmentation in computer vision (Lazic et 
al. 2009). It provides a natural way to incorporate con- 
straints while simultaneously improving the net similarity of 
the cluster assignments, which is not trivial to handle in
standard clustering techniques. In addition, no strong
assumption is required on the threshold, which determines whether
clusters should be merged or not. Moreover, the cluster
assignments can be changed during the inference process as
suggested by the emergence of exemplars. Nevertheless,
to our knowledge, there is no extension of the AP algorithm to
learn tree structures from many sparse and shallow trees as
presented in this work.

There are many statistical relational learning (SRL)
approaches that are applicable as well. For example, Markov
Logic Networks (MLN) (Richardson and Domingos 2006), a
generic framework for solving probabilistic inference problems,
may also be applied to folksonomy learning, by translating
the similarity function as well as the constraints into
predicates. Since our similarity function is continuous, hybrid
MLN (HMLN) (Wang and Domingos 2008) would be required.
The AP framework has advantages due to its simplicity;
however, we plan to carry out more comparative work with
existing SRL approaches, especially as we explore more
complex similarity functions.

Discussion and Conclusion 

In this paper, we introduce relational affinity propagation
(RAP), an extension to affinity propagation that learns
structures from data by incorporating structural constraints. RAP
optimizes the net similarity and uses the structural
constraint to find good solutions within a space of "valid"
solutions. Thus, the final net similarity of the tree learned by
RAP is better than that of the tree learned by AP. Our
validations on toy and real-world data support this claim. For
future work, we would like to apply a more sophisticated
similarity function, which utilizes class labels as in a
collective relational clustering approach, in order to improve the
quality of the learned structures. In addition, the structural
constraints can be modified to guide RAP to induce other
classes of graphs, e.g., a DAG. We would also like to extend
RAP to other structure learning problems.